Simultaneous branch and warp interweaving for sustained GPU performance

Similar resources

Perceptual Performance Impact of GPU-based Warp & Anti-aliasing for Image Generators

In 2012 the U.S. Air Force School of Aerospace Medicine, in partnership with the Air Force Research Laboratory (AFRL) and NASA AMES, constructed the Operational Based Vision Assessment (OBVA) simulator. This 15-channel, 150-megapixel display system remains one of the highest resolution displays ever built. One of the original goals for the simulator was to implement a distortion correction syste...

Simultaneous Multithreading’s Real Effect on Cache and Branch Prediction Performance

Warp-Level Parallelism: Enabling Multiple Replications In Parallel on GPU

Stochastic simulations need multiple replications in order to build confidence intervals for their results. Even if we do not need a large number of replications, it is good practice to reduce the overall simulation time using the Multiple Replications In Parallel (MRIP) approach. This approach usually assumes access to a parallel computer such as a symmetric multiprocessing machine ...

RLWS: A Reinforcement Learning based GPU Warp Scheduler

The Streaming Multiprocessors (SMs) of a Graphics Processing Unit (GPU) execute instructions from groups of consecutive threads, called warps. At each cycle, an SM schedules a warp from a group of active warps and can context switch among the active warps to hide various stalls. Hence the performance of the warp scheduler is critical to the performance of the GPU. Several heuristic warp scheduling alg...

Branch prediction and simultaneous multithreading

In this paper, we examined the behavior of three of the best performing branch prediction strategies while executing several threads of instructions simultaneously. We studied the impact of the addition of one Return Address Stack per hardware context. We showed that a 12-deep stack per thread is sufficient to enhance greatly the accuracy of branch prediction while adding a minimal implementation...

Journal

Journal title: ACM SIGARCH Computer Architecture News

Year: 2012

ISSN: 0163-5964

DOI: 10.1145/2366231.2337166